Inspired by recent experiments on DNA replication, we apply a one-dimensional nucleation-and-growth model to DNA-replication kinetics, focusing on how to extract the time-dependent nucleation rate $I(t)$ and growth speed $v$ from data. We discuss generic experimental problems, namely spatial inhomogeneity, measurement noise, and finite-size effects. After evaluating how each of these affects the measurements of $I(t)$ and $v$, we give guidelines for the design of experiments. These ideas are then discussed in the context of the DNA-replication experiments.
I. INTRODUCTION

Since its development in the late 1930s, the phenomenological model of nucleation and growth of Kolmogorov, Johnson-Mehl, and Avrami (KJMA) has been widely applied to the analysis of the kinetics of first-order phase transformations, mostly in two and three spatial dimensions [1, 2, 3]. The model has several exact results given the following basic assumptions: (1) the system is infinitely large and untransformed at time $t = 0$; (2) nucleations occur stochastically, homogeneously, and independently of one another; (3) the transformed domains grow outward uniformly, keeping their shape; and (4) growing domains that impinge coalesce.

Although the KJMA model is conceptually simple, experiments often have complicating factors that make the contact between theory and experiment delicate and lead to deviations from the basic model. For example, a principal result of the KJMA model is that the fraction $f(t)$ of the transformed volume at time $t$ is

$$ f(t) = 1 - e^{-A t^{\alpha}}, \tag{1} $$

where $A$ and $\alpha$ are constants: $A$ depends upon the growth velocity $v$, the nucleation rate $I$, and the spatial dimension $D$, while $\alpha$ is determined by $I$ and $D$. In the literature, $\alpha$ is called the Avrami exponent. "Avrami plots" of $\ln[-\ln(1-f)]$ vs. $\ln t$ should thus be straight lines of slope $\alpha$. Unfortunately, Eq. (1) often does not fit data well because the experimental conditions do not satisfy the assumptions of the KJMA theory [...]. For example, nucleation can be inhomogeneous or correlated [...]; real systems are finite; and there is always measurement noise.
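As a quick illustration of the Avrami plot, the following sketch (plain Python; the function name `avrami_fit` is ours, not from any library) fits a straight line to $\ln[-\ln(1-f)]$ vs. $\ln t$ by ordinary least squares and returns the slope $\alpha$ and the prefactor $A$. It assumes synthetic, noise-free data: for a 1D system with $I(t) = It$ and constant $v$, the KJMA result is $f = 1 - \exp(-vIt^3/3)$, so the expected slope is $\alpha = 3$ and $A = vI/3$.

```python
import math


def avrami_fit(times, fractions):
    """Least-squares fit of ln[-ln(1 - f)] = ln A + alpha * ln t.

    Returns (alpha, A). Points with t <= 0, f <= 0, or f >= 1 are skipped,
    because the Avrami transform is undefined there.
    """
    xs, ys = [], []
    for t, f in zip(times, fractions):
        if t > 0 and 0.0 < f < 1.0:
            xs.append(math.log(t))
            ys.append(math.log(-math.log(1.0 - f)))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable (t, f) points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    log_A = mean_y - alpha * mean_x
    return alpha, math.exp(log_A)


# Synthetic 1D data with I(t) = I*t: f = 1 - exp(-v*I*t**3/3), so alpha = 3, A = v*I/3.
v, I = 0.5, 1e-5
ts = [float(t) for t in range(10, 200, 10)]
fs = [1.0 - math.exp(-v * I * t**3 / 3) for t in ts]
alpha, A = avrami_fit(ts, fs)
print(f"alpha = {alpha:.3f}, A = {A:.3e}")
```

With real data, curvature in this plot is the usual first sign that one of the KJMA assumptions listed above is violated.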

In two- or three-dimensional systems, where only limited theoretical results such as Eq. (1) are available, it can be difficult to pinpoint the origins of discrepancies between experimental data and the predictions of the KJMA model. In one-dimensional systems, however, several scientists have shown since the 1980s that one can push the analysis much further than for the original version of the KJMA model [..., 12].

In this paper, we shall show that a detailed theoretical understanding of the KJMA model in 1D lets us compare theory and experiment more directly. In other words, we can extract the kinetic parameters from data under less-than-ideal experimental circumstances. Our discussion will be set in the context of recent DNA-replication experiments that have drawn attention from both the physics and biology communities [...].

II. APPLICATION OF THE 1D-KJMA MODEL TO EXPERIMENTAL SYSTEMS

Although there are many analytical results for the 1D-KJMA model, only a very few 1D systems that are well described by this model have been identified (e.g., [...]), and very little detailed analysis has been done on those systems. Recently, however, Herrick et al. have identified a formal analogy between the 1D-KJMA model and DNA replication processes [...]. Equally important, they have developed experimental methods that can yield large quantities of data, allowing the extraction of detailed statistical quantities. Since the DNA work provides a model system for testing the general experimental problems discussed above, and also in order to fix the language, we begin by reviewing the mapping between DNA replication and the KJMA model.



1. Mapping DNA replication onto the KJMA model 

Although the organization of the genome for DNA replication varies considerably from species to species, the duplication of most eukaryotic genomes shares a number of common features [...]:

1. DNA replication starts at a large number of sites known as "origins of replication." The DNA domain replicated from each origin is referred to, informally, as an "eye" or a "replication bubble" because of its appearance in electron microscopy.

2. The position of each potential origin that is "competent" to initiate DNA replication is determined before the beginning of the synthesis part of the cell cycle ("S phase"), when several proteins, including the origin recognition complex (ORC), bind to DNA, forming a pre-replication complex (pre-RC).

3. During S phase, a particular potential origin may or may not be activated. Each origin is activated not more than once during the cell-division cycle.

4. DNA synthesis propagates bidirectionally at replication forks from each activated origin, with propagation speed or fork velocity $v$. Experimentally, $v$ is approximately constant throughout S phase.

5. DNA synthesis stops when two newly replicated regions of DNA meet.

FIG. 1: Mapping DNA replication onto the one-dimensional KJMA model.

From Fig. 1, it is apparent that processes 3-5 have a formal analogy with nucleation and growth in one dimension. We identify (1) nucleation of islands as activation (initiation) of replication origins; (2) growth of the eyes as growth of the islands; and (3) coalescence of two expanding eyes as the merging of growing islands. Of course, while DNA is topologically one dimensional, it is embedded in three-dimensional space.

In an ideal world, one could monitor the replication process continuously and compile domain statistics in real time. In the real world, the three billion DNA basepairs (bps) of a typical higher eukaryote, which replicate in as many as ~10^… sites simultaneously, are packed in a cell nucleus of radius ~1 μm, making direct, real-time monitoring impossible [18]. Recently, experiments have used two-color fluorescent labeling of DNA bases to study replication kinetics indirectly [...]. One begins (in a test tube) by labeling the bases used in replicating the DNA with, say, a red dye. At some time during the replication process (e.g., $t_1$ in Fig. 1), one floods the test tube with green-labeled bases and allows the replication cycle to go to completion. One then stretches the DNA onto a glass slide ("molecular combing" [...]), a process that unfortunately also breaks the DNA strands into finite segments. Under a microscope, regions that replicated before the second dye was added are red, while those labeled afterwards are predominantly green. The alternating red-and-green regions correspond to eyes and holes in Fig. 1, forming a kind of snapshot of the replication state of the DNA fragment at the time the second dye was added. Each time point in Fig. 1 would thus correspond to a separate experiment.
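Reading such a snapshot amounts to run-length encoding a color track. The sketch below assumes, hypothetically, that each combed fiber has already been digitized into a string of pixel labels, 'R' for red (replicated before $t_1$) and 'G' for green; the encoding and the function name are ours. Note that the first and last domains of a fiber are cut by the fiber ends, which is the origin of the finite-size bias discussed later.

```python
from itertools import groupby


def domain_lengths(track):
    """Run-length encode a two-color track.

    track: string of 'R' (red: replicated before the second dye) and 'G' (green).
    Returns (eye_lengths, hole_lengths, replicated_fraction), lengths in pixels.
    """
    eyes, holes = [], []
    for is_red, run in groupby(track, key=lambda c: c == "R"):
        (eyes if is_red else holes).append(sum(1 for _ in run))
    fraction = sum(eyes) / len(track) if track else 0.0
    return eyes, holes, fraction


eyes, holes, f = domain_lengths("GGRRRGRRGGGG")
print(eyes, holes, round(f, 3))  # [3, 2] [2, 1, 4] 0.417
```

Pooling these lists over many fibers gives the eye, hole, and eye-to-eye statistics used below.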

Using the formal analogy between DNA replication and the 1D nucleation-and-growth model, we can extract the kinetic parameters $I(t)$ and $v$ from data [...]. For the ideal case, the procedure is straightforward. For real-world data, on the other hand, one has to be cautious because of the generic problems explained above. We have already mentioned that the molecular-combing process chops the DNA into finite-size segments, which effectively truncates the full statistics. Another problem in the experimental protocols is that an in-vitro replication experiment usually has many different nuclei in the test tube. These nuclei start replication at different, unknown times and locations along the genome [...]. The asynchrony leads to sample heterogeneity and creates a starting-time distribution for DNA replication [...]. Finally, the finite resolution of the microscope used to measure domain sizes may affect the statistics.

Below, we shall examine each of these complicating 
factors, present empirical criteria for their significance, 
and then discuss the implications of these criteria for the 
design of experiments. 

To set the stage, we begin with the problem of extracting experimental parameters from ideal data.



2. Ideal case 

From the theoretician's point of view, a system can be said to be ideal when it satisfies all underlying assumptions of the theory. In the context of DNA replication and the KJMA model, this means that the DNA molecule is infinitely long and that the initiation rate $I$ of replication is homogeneous and uncorrelated. Also, statistics should be directly obtainable at any time point $t$ at arbitrarily fine resolution. Because the growth velocity of replicated DNA domains has been measured to be approximately constant, we shall limit our analysis to this special case. One can then apply the KJMA model to a single experimental realization to extract kinetic parameters such as $I(t)$ and $v$.

In order to do this, we note that the simulation in our previous paper [12] (hereafter, Paper I) is in practice such a case (system size $= 10^7$, $v = 0.5$, $dt = 0.1$, $I(t) = I \cdot t$, where $I = 10^{-5}$). Using the theoretical results obtained in Paper I, we can find an expression to invert $I(t)$ from data. For example, the domain density $n(t)$ and the island fraction $f(t)$ at time $t$, given a time-dependent nucleation rate $I(t)$, are [...]



$$ n(t) = g(t)\, e^{-2v \int_0^t g(t')\,dt'}, \tag{2a} $$

$$ f(t) = 1 - S(t) = 1 - e^{-2v \int_0^t g(t')\,dt'}. \tag{2b} $$



In Eq. (2), $g(t) = \int_0^t I(t')\,dt'$, and $S(t)$ is the hole fraction. Note that $n(t)^{-1}$ is equal to the average island-to-island distance $\ell_{i2i}(t)$ at time $t$. On the other hand, the average hole size $\ell_h(t)$ is $S(t)/n(t) = g(t)^{-1}$. Since all three domains (island, hole, and island-to-island) have equal densities $n(t)$ in one dimension, we have the following general relationship among them, which is valid even in the presence of correlations between domain sizes:



$$ f(t) = \frac{\ell_i(t)}{\ell_i(t) + \ell_h(t)}, \tag{3a} $$

$$ \ell_{i2i}(t) = \ell_i(t) + \ell_h(t). \tag{3b} $$



In other words, there are only two independent quantities among $f(t)$, $\ell_i(t)$, $\ell_h(t)$, and $\ell_{i2i}(t)$, and we can calculate $\ell_i(t)$ even if we do not know the exact expression for the island distribution $\rho_i(x, t)$:



$$ \ell_i(t) = \frac{f(t)}{n(t)} = \frac{1}{g(t)}\left[ e^{2v \int_0^t g(t')\,dt'} - 1 \right]. \tag{4} $$
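To make Eqs. (2)-(4) concrete, here is a short sketch that evaluates them in closed form for the linear rate $I(t) = It$ used in Paper I, where $g(t) = It^2/2$ and $\int_0^t g\,dt' = It^3/6$. The defaults $I = 10^{-5}$ and $v = 0.5$ are the simulation values quoted above; the function name is ours. The final checks confirm the identities of Eq. (3).

```python
import math


def kjma_means(t, I=1e-5, v=0.5):
    """Mean 1D-KJMA quantities at time t > 0 for I(t) = I*t (Eqs. (2)-(4))."""
    g = I * t**2 / 2               # g(t) = integral of I(t') from 0 to t
    G = I * t**3 / 6               # integral of g(t') from 0 to t
    S = math.exp(-2 * v * G)       # hole fraction S(t)
    n = g * S                      # domain density, Eq. (2a)
    return {
        "f": 1 - S,                             # Eq. (2b)
        "n": n,
        "l_h": 1 / g,                           # mean hole size
        "l_i": math.expm1(2 * v * G) / g,       # mean eye size, Eq. (4)
        "l_i2i": 1 / n,                         # mean eye-to-eye distance
    }


m = kjma_means(60.0)
# Eqs. (3a) and (3b) hold identically:
print(math.isclose(m["f"], m["l_i"] / (m["l_i"] + m["l_h"])))   # True
print(math.isclose(m["l_i2i"], m["l_i"] + m["l_h"]))            # True
```

`math.expm1` is used for $e^{x} - 1$ so that $\ell_i$ stays accurate at early times, when the exponent is tiny.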
Note that $\ell_i(t)$ [$\ell_h(t)$] is a monotonically increasing (decreasing) function of time; since, in addition, $\ell_h \to \infty$ as $t \to 0$ and $\ell_i \to \infty$ as $f \to 1$, Eq. (3b) implies that $\ell_{i2i}(t)$ has a well-defined minimum. We emphasize that Eqs. (2) and (3) set the basic time and length scales, $t^*$ and $\ell^*$, of the system. Because the KJMA model has essentially only one scale, it is simpler than other common stochastic models in physics that lack an intrinsic scale and hence show fractal behavior (structure at all scales). Since $f(t)$ is sigmoidal, varying from 0 to 1, we define $t^*$ to be the time required for the system to reach $f = 0.5$. We define $\ell^*$ to be the minimum eye-to-eye (island-to-island) distance during the course of replication [see Figs. 2(c) and (d)].
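For $I(t) = It$ both scales follow in closed form, which gives a useful check on the value $t^* \approx 75$ quoted later. Setting $f(t^*) = 1/2$ in Eq. (2b) gives $vIt^{*3}/3 = \ln 2$; maximizing $n(t) = (It^2/2)\,e^{-vIt^3/3}$ gives $vIt^3 = 2$, which is where $\ell_{i2i} = 1/n$ reaches its minimum $\ell^*$. A minimal sketch under these assumptions (the function name is ours):

```python
import math


def kjma_scales(I=1e-5, v=0.5):
    """Basic time and length scales (t*, l*) of the 1D KJMA model with I(t) = I*t."""
    t_star = (3 * math.log(2) / (v * I)) ** (1 / 3)   # f(t*) = 1/2
    t_min = (2 / (v * I)) ** (1 / 3)                  # dn/dt = 0, i.e., max of n(t)
    n_max = (I * t_min**2 / 2) * math.exp(-v * I * t_min**3 / 3)
    l_star = 1 / n_max                                # minimum of l_i2i = 1/n
    return t_star, l_star


t_star, l_star = kjma_scales()
print(f"t* = {t_star:.1f}, l* = {l_star:.1f}")  # t* = 74.6, l* = 71.8
```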

From Eqs. (2) and (3), it is straightforward to invert the mean quantities to obtain the nucleation rate $I(t)$ and the growth velocity $v$:



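$$ I(t) = \frac{d}{dt}\left[\frac{1}{\ell_h(t)}\right], \tag{5a} $$

$$ v = -\frac{1}{2}\,\frac{\ln S(t)}{\int_0^t \left[1/\ell_h(t')\right] dt'}. \tag{5b} $$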



Equation (5) can then be applied to an ideal set of data, i.e., one for which noise-free measurements are made on infinitely long DNA. As Fig. 2 shows, we can accurately recover the input parameters from the simulation results of Paper I: the extracted parameters are $I = (0.99 \pm 0.04) \times 10^{-5}$ and $v = 0.50 \pm 0.02$. [The errors are the statistical errors from the curve fits in Figs. 2(a) and (b).] We note that the fluctuations visible for $t > 75$ arise from using direct numerical differentiation in Eq. (5). One could reduce the noise by appropriate data processing, using, for example, a smoothing spline [20]. However, because any data filtering is a delicate issue, and because direct numerical differentiation produced satisfactory results, we have decided to forgo any smoothing.

FIG. 2: Parameter extraction from an almost ideal data set. (a) Inferred nucleation rate vs. time; (b) velocity vs. time; (c) average domain sizes vs. time; (d) island fraction vs. time; theory and extracted $f(t)$ overlap. In (c), $\ell^*$ is the minimum average eye-to-eye spacing and sets the basic length scale. In (d), $t^*$ is the time at which 50% of the genome has replicated; it sets the basic time scale.
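The direct inversion of Eq. (5) can be sketched as follows, assuming noise-free samples that start at $t = 0$ (where $\ell_h$ is infinite and $S = 1$). Central differences and the trapezoid rule are our choices for illustration; the text does not specify the discretization used for Fig. 2, and the function name is hypothetical.

```python
import math


def invert_ideal(times, l_h, S):
    """Direct inversion of Eq. (5) from sampled mean hole size l_h(t) and hole fraction S(t).

    times must start at t = 0. I(t) is estimated by central differences of 1/l_h;
    the integral of 1/l_h uses the trapezoid rule. The returned lists are aligned
    with times[1:-1] (I) and times[1:] (v).
    """
    inv_lh = [1.0 / x for x in l_h]           # 1/l_h = g(t); 1/inf gives 0.0
    I_est = [
        (inv_lh[k + 1] - inv_lh[k - 1]) / (times[k + 1] - times[k - 1])
        for k in range(1, len(times) - 1)
    ]
    v_est, integral = [], 0.0
    for k in range(1, len(times)):
        integral += 0.5 * (inv_lh[k] + inv_lh[k - 1]) * (times[k] - times[k - 1])
        v_est.append(-0.5 * math.log(S[k]) / integral)
    return I_est, v_est


# Noise-free synthetic data for I(t) = I*t with I = 1e-5, v = 0.5, dt = 0.1.
I_true, v_true = 1e-5, 0.5
times = [0.1 * k for k in range(1501)]
l_h = [math.inf if t == 0 else 2 / (I_true * t**2) for t in times]
S = [math.exp(-v_true * I_true * t**3 / 3) for t in times]
I_est, v_est = invert_ideal(times, l_h, S)
print(I_est[499] / times[500], v_est[999])  # close to 1e-5 and 0.5 (t = 50 and t = 100)
```

With noisy simulated or measured $\ell_h(t)$, the central difference is where the fluctuations noted above enter.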

We also note that there are statistical fluctuations related to the finite size of the system: as $f(t)$ approaches 1, the number of domains $n(t)$ becomes very small; thus, even small changes in $n(t)$ can cause significant fluctuations in average domain sizes. However, the finite-size effect in this case becomes visible only when the number of new nucleations in each step, $N(t)$, is roughly 1 ($t > 165$, or $f > 0.999$). The effect can be ignored for $N(t) \gg 1$ for the practically infinite system considered here.

In the following sections, we consider the complications 
that arise from less-ideal experimental conditions. 



3. Asynchrony 

As we mentioned above, data often come from experiments where the DNA from many different independently replicating cells is simultaneously present in the same test tube. The individual DNA molecules begin replicating at different, unknown starting times. In such cases, it is simpler to begin by sorting data by the replicated fraction $f$ of the measured segment [...]. The basic idea is that for spatially homogeneous replication (namely, nucleation and growth), all segments with a similar fraction $f$ are at roughly the same point in S phase. Since $f(t)$ is a monotonically increasing function of $t$, we can essentially use $f$ as our internal clock, leaving the conversion to real time $t$ to a second step.

FIG. 3: (Color online). Inversion results in the presence of asynchrony and finite-size effects. (a) $I/2v$ vs. $2vt$. The arrows indicate where $f = 0.8$ in the $f$ vs. $t$ curves in (d) for three different molecule sizes: $10^4$ (unchopped), 1000 and 250 (chopped). (b) $\rho(f, t_i)$ for six time points 60, 80, 100, 120, 140, 160 (from left to right). The circles are simulation data; the solid lines are from Eq. (7) using the extracted parameters in Table I. (c) Optimization results for the starting-time distribution $\phi(\tau)$. The solid line is a Gaussian fit. (d) $f$ vs. $2vt$ for $\ell_c = 250$ and $\ell_c = 1000$. The solid line is the unchopped case (size $10^4$). (e) Average domain sizes vs. $f$. The empty circles are for the unchopped case, while the dotted and dashed curves correspond to $\ell_c = 1000$ and 250. (f) Plot of $\log[\chi^2(v)]$ (arbitrary units) vs. $v$ for size $10^4$. The complete fit results are shown in Table I. See also text.

Once the data have been sorted by $f$, we extract the initiation frequency $I$ as a function of $f$. Using Eqs. (2)-(4), together with $df/dt = 2vn = 2v/(\ell_i + \ell_h)$, one can straightforwardly obtain expressions analogous to Eq. (5):

$$ \frac{I(f)}{2v} = \frac{1}{\ell_i + \ell_h}\,\frac{d}{df}\left(\frac{1}{\ell_h}\right), \tag{6a} $$

$$ 2v\,t(f) = \int_0^f (\ell_i + \ell_h)\, df'. \tag{6b} $$

In Eq. (6), $\ell_i$ and $\ell_h$ are functions of $f$. In other words, we have a direct inversion, $I/2v$ vs. $2vt$, from data [Fig. 3(a)]. Note that both $I$ and $t$ are always accompanied by the factor $2v$, which has to be determined independently (see below). As before, the fluctuations in the extracted $I/2v$ are the result of direct numerical differentiation in Eq. (6), as discussed in the previous section.
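Equation (6) can be applied directly to binned data. The sketch below assumes the inputs are sequences already sorted by increasing $f$ (for example, bin averages of $\ell_i$ and $\ell_h$), uses the trapezoid rule for Eq. (6b) and a central difference for Eq. (6a), and starts the integral at the first sample rather than at $f = 0$, where $\ell_h$ diverges. The function name and these discretization choices are ours; the usage part regenerates noise-free inputs from the $I(t) = It$ model.

```python
import math


def invert_by_fraction(f, l_i, l_h):
    """Eq. (6): I/2v and 2v*t as functions of the replicated fraction f.

    The integral in Eq. (6b) starts at the first sample, so the returned 2v*t is
    measured relative to t(f[0]). I/2v is None at the two end points.
    """
    total = [a + b for a, b in zip(l_i, l_h)]   # l_i + l_h = l_i2i = 1/n
    two_v_t = [0.0]
    for k in range(1, len(f)):
        two_v_t.append(two_v_t[-1] + 0.5 * (total[k] + total[k - 1]) * (f[k] - f[k - 1]))
    I_over_2v = [None] * len(f)
    for k in range(1, len(f) - 1):
        d_inv_lh = (1 / l_h[k + 1] - 1 / l_h[k - 1]) / (f[k + 1] - f[k - 1])
        I_over_2v[k] = d_inv_lh / total[k]
    return I_over_2v, two_v_t


# Noise-free inputs from the I(t) = I*t model, sampled from t = 1 to t = 120.
I_true, v_true = 1e-5, 0.5
ts = [1 + 0.05 * k for k in range(2381)]
G = [I_true * t**3 / 6 for t in ts]                  # integral of g from 0 to t
f = [-math.expm1(-2 * v_true * x) for x in G]        # Eq. (2b)
l_h = [2 / (I_true * t**2) for t in ts]              # 1/g
l_i = [math.expm1(2 * v_true * x) * lh for x, lh in zip(G, l_h)]   # Eq. (4)
I2v, tau = invert_by_fraction(f, l_i, l_h)
print(I2v[1180], tau[-1])   # about I*60/(2v) = 6e-4 and 2v*(120 - 1) = 119
```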

In the two-color labeling experiments, we can compile statistics into histograms of the distribution $\rho(f, t_i)$ of replicated fractions $f$ at time $t_i$ [Fig. 3(b)], where $t_i$ is the timepoint at which the second dye was added (Fig. 1). Note that the spread in $\rho(f, t_i)$ is related to the starting-time distribution $\phi(\tau)$ via the kinetic curve $f(t)$, where $\tau$ is the laboratory time at which each DNA molecule starts replicating, and $t$ is the duration of time since the onset of replication. Since $\phi(\tau)\,d\tau = \rho(f(t'), t_i)\,df(t')$, where $t' = t_i - \tau$, we obtain

$$ \rho(f, t_i) = \phi(\tau) \times \left( \left.\frac{df}{dt}\right|_{t = t_i - \tau} \right)^{-1}. \tag{7} $$
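Equation (7) can be sketched for the ideal kinetic curve of the $I(t) = It$ model, $f(t) = 1 - \exp(-vIt^3/3)$, and a Gaussian $\phi(\tau)$. The defaults mirror the simulation described below ($\tau_0 = 40$, $\sigma_\tau = 10$, $I = 10^{-5}$, $v = 0.5$); the function name is ours. For a given $f$, the sketch inverts the kinetic curve to get the elapsed time $t$, sets $\tau = t_i - t$, and divides $\phi(\tau)$ by $df/dt$.

```python
import math


def rho(f, t_i, tau0=40.0, sigma=10.0, I=1e-5, v=0.5):
    """Eq. (7) for a Gaussian starting-time distribution and the ideal curve
    f(t) = 1 - exp(-v*I*t**3/3), i.e., I(t) = I*t. Valid for 0 < f < 1."""
    t = (-3.0 * math.log(1.0 - f) / (v * I)) ** (1.0 / 3.0)   # invert f(t)
    tau = t_i - t                                              # starting time
    phi = math.exp(-0.5 * ((tau - tau0) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    dfdt = v * I * t**2 * (1.0 - f)                            # df/dt at t = t_i - tau
    return phi / dfdt


# Histogram shape at t_i = 100 for a few replicated fractions.
print([round(rho(f, 100.0), 3) for f in (0.1, 0.3, 0.5, 0.7, 0.9)])
```

Integrating $\rho$ over $0 < f < 1$ returns the fraction of molecules that started before $t_i$, which is a convenient consistency check on any implementation of Eq. (7).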

For a Gaussian starting-time distribution $\phi(\tau)$, one can in principle fit all the $\rho(f, t_i)$'s using three fitting parameters: $v$, the average starting time $\tau_0$, and the starting-time width $\sigma_\tau$. Unfortunately, this "brute-force" approach did not produce satisfactory results, as the basin of attraction of the minimum proved to be relatively small.

Our strategy, then, was first to obtain the coarse-grained global plot of $\chi^2$ vs. $v$ shown in Fig. 3(f), as follows:

1. Guess a range of $v$ between $v_{\min}$ and $v_{\max}$.

2. Fix $v$ (starting from $v = v_{\min}$), and trace $\rho(f, t_i)$ back in time. For a specific value of $f$ and timepoint $t_i$, the corresponding starting time is $t_i - t(f)$ [Eq. (6b)]. Repeat for all the $\rho(f, t_i)$'s and reconstruct the starting-time distribution $\phi(\tau)$.

3. Fit the $\phi(\tau)$ obtained in step 2 to an empirical model. (In the absence of correlations among starting times, a Gaussian distribution is a reasonable choice [...]. One may also know the rough form of $\phi(\tau)$ from an understanding of the origins of the asynchrony.)

4. Regenerate $\rho(f, t_i)$ using Eq. (7) with the parameters obtained in steps 2 and 3. Calculate $\chi^2$ for $\rho(f, t_i)$. This is also a global fit, as the $\chi^2$ statistic is summed over data from all timepoints $t_i$.

5. Increase $v$ to $v + \Delta v$ and repeat steps 2-4. If there is a well-defined minimum of $\chi^2(v)$ (with corresponding $\tau_0$ and $\sigma_\tau$) [e.g., Fig. 3(f)], one can find a more accurate estimate of the minimum using a standard optimization technique such as Brent's method [20, ...]. Otherwise, go back to step 1 and choose a different range of $v$.
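The outer loop of steps 1 and 5 can be sketched independently of how $\chi^2(v)$ is computed. In the sketch below, `chi2` is any callable standing in for steps 2-4; the grid size, tolerance, and function name are hypothetical. For simplicity it refines the bracketed minimum with golden-section search, a simpler relative of Brent's method (both need only a bracket and function values); it is not the authors' code.

```python
import math


def scan_then_refine(chi2, v_min, v_max, n_grid=20, tol=1e-6):
    """Coarse grid scan of chi2(v), then golden-section refinement.

    Returns (v_best, chi2_best), or None when the grid minimum sits on the edge
    of [v_min, v_max], the cue to go back to step 1 with a different range.
    """
    grid = [v_min + k * (v_max - v_min) / n_grid for k in range(n_grid + 1)]
    values = [chi2(v) for v in grid]
    k = min(range(n_grid + 1), key=values.__getitem__)
    if k == 0 or k == n_grid:
        return None
    a, b = grid[k - 1], grid[k + 1]          # bracket around the coarse minimum
    r = (math.sqrt(5) - 1) / 2               # inverse golden ratio, about 0.618
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = chi2(c), chi2(d)
    while b - a > tol:
        if fc < fd:                          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = chi2(c)
        else:                                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = chi2(d)
    v_best = 0.5 * (a + b)
    return v_best, chi2(v_best)


# Toy chi^2 with its minimum at v = 0.453 (a stand-in for steps 2-4).
print(scan_then_refine(lambda v: (v - 0.453) ** 2 + 1.0, 0.3, 0.7))
```

Because each evaluation of $\chi^2(v)$ involves a full reconstruction of $\phi(\tau)$, a method that needs few evaluations per refinement step is preferable.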

In order to test how well the optimization method described above works in the face of asynchrony, we have repeated the simulation in Paper I with several modifications. First, we have used 1000 molecules that started nucleations asynchronously, following a Gaussian distribution with average starting time $\tau_0 = 40$ and starting-time width $\sigma_\tau = 10$. Second, the size of each individual molecule is $10^4$ instead of $10^7$. This keeps constant the total number of "DNA basepairs" analyzed.
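Asynchronous test data of this kind can be imitated with a short Monte Carlo sketch. This is not the code of Paper I: it assumes a lattice of unit spacing with continuous firing times, and it uses the standard observation that a candidate origin falling in already-replicated DNA never changes the replication-time field, so each site's replication time is the earliest arrival over all candidate cones, computed here with two linear sweeps. Function names and the smaller molecule size (chosen for speed) are hypothetical.

```python
import math
import random


def replication_times(L, I, v, t_max, rng):
    """Replication time of each site of one molecule (lattice spacing 1, I(t) = I*t)."""
    T = [math.inf] * L
    expected = L * I * t_max**2 / 2            # mean number of candidate origins
    clock = rng.expovariate(1.0)
    while clock < expected:                    # Poisson count via unit-rate arrivals
        t0 = t_max * math.sqrt(rng.random())   # firing time, density proportional to t
        x0 = rng.randrange(L)
        T[x0] = min(T[x0], t0)
        clock += rng.expovariate(1.0)
    step = 1.0 / v                             # time for a fork to advance one site
    for x in range(1, L):                      # rightward-moving forks
        T[x] = min(T[x], T[x - 1] + step)
    for x in range(L - 2, -1, -1):             # leftward-moving forks
        T[x] = min(T[x], T[x + 1] + step)
    return T


def snapshot_fractions(n_mol, L, I, v, t_i, tau0, sigma, seed=0):
    """Replicated fraction of each molecule at lab time t_i, with Gaussian start times."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_mol):
        elapsed = t_i - rng.gauss(tau0, sigma)  # time since this molecule started
        if elapsed <= 0:
            fractions.append(0.0)
            continue
        T = replication_times(L, I, v, elapsed, rng)
        fractions.append(sum(1 for tx in T if tx <= elapsed) / L)
    return fractions


# Molecules of 2000 sites instead of 10^4; one histogram of f at t_i = 100.
fs = snapshot_fractions(n_mol=200, L=2000, I=1e-5, v=0.5, t_i=100.0, tau0=40.0, sigma=10.0)
print(min(fs), sum(fs) / len(fs), max(fs))
```

Repeating the call for each $t_i$ yields a set of histograms $\rho(f, t_i)$ like those in Fig. 3(b).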

Since we used the same nucleation rate, the time to replicate to $f = 0.9$ was roughly 100 minutes, about the same as for the much larger system [see Figs. 2(d) and 3(d)]. We have chosen six timepoints ($t_i = 60, 80, 100, 120, 140, 160$) at which to collect data, and the distributions of the fraction $f$ are shown in Fig. 3(b). The spread in $\rho(f, t_i)$ reflects the starting-time distribution $\phi(\tau)$.

We fit $I/2v$ vs. $2vt$ using $I(t) = a + I \cdot t$ in Fig. 3(a), excluding the last few points (roughly above $f = 0.9$) to take into account the finite-size effect (see the following section). We then used the fit result to obtain the growth rate $v$ by the optimization method given above. The results are shown in Fig. 3 and Table I. In the plot of $\chi^2$ vs. $v$ [Fig. 3(f)], we see a well-defined minimum of $\chi^2$ at $v = 0.453$, 10% below the input value 0.5. Figures 3(b) and (c) are reconstructions of $\rho(f, t_i)$ and $\phi(\tau)$ using the parameters in Table I. The minor discrepancies in $\tau_0$ and $\sigma_\tau$ are acceptable, given the small number of points of $\rho(f, t_i)$ used in the optimization (20 points in each of six histograms). Note that the finite size of the sampled DNA is responsible for a larger part of the discrepancy with the original parameters than is our reconstruction algorithm.

The success of this method also depends on the experimental design; i.e., one has to choose the right timepoints $t_i$ in order to deduce $\phi(\tau)$ accurately [see Figs. 3(b) and (c)]. The key parameter is the ratio $\alpha$ between the replication time scale $t^*$ and the starting-time width $\sigma_\tau$: $\alpha = t^*/\sigma_\tau$. For the case considered here ($t^* \approx 75$ and $\sigma_\tau \approx 14$), $\alpha \approx 5.4$.

Ideally, $\alpha \gg 1$ (better synchrony with slow kinetics), so that $\rho(f, t_i)$ has a well-defined peak in $0 < f < 1$ and $\rho(f, t_i) \to 0$ as $f \to 0$ and 1. In this case, even a single $\rho(f, t_i)$ can be used to reconstruct $\phi(\tau)$ and extract $v$ accurately. For example, each single histogram, for every timepoint in Fig. 3(b), produced results that are accurate to 15%.

For $\alpha \ll 1$ (high asynchrony with fast kinetics), $\rho(f, t_i)$ is spread over $0 < f < 1$. In this case, experimentalists should choose at least $N = \sigma_\tau/t^*$ timepoints to cover the whole range of $\phi(\tau)$, where well-chosen $t_i$'s spread the peaks of $\rho(f, t_i)$ evenly between 0 and 1.





Parameter                    Input           Extracted
I                            1 × 10^-5       (0.98 ± 0.18) × 10^-5
v                            0.5             0.453
starting t (τ0 ± στ)         39.6 ± 14.1     36.5 ± 13.9

TABLE I: Comparison between input and extracted parameters in the presence of asynchrony (starting $t$). Note that the input $\tau_0 \pm \sigma_\tau$ is the Gaussian fit to a single realization of 1000 molecules, where $\tau_0 = 40$ and $\sigma_\tau = 10$ [...].
4. Finite-size effects

As mentioned above, the DNA is broken up into relatively short segments during the molecular-combing experiments. In order to estimate how the finite segment size affects the estimates of $I(t)$ and $v$, we have cut the simulated molecules from the previous section into smaller pieces of equal size $\ell_c$ [...]. Fig. 3 shows results for $\ell_c = 1000$ and 250, with original size $10^4$. As one can see, there is a clear correlation between $\ell_c$ and the statistics. First, the smaller the segments are, the smaller the average domain sizes become as $f \to 1$. This is as expected, since one obviously cannot observe a domain larger than $\ell_c$. Note that an underestimate of the average eye and hole sizes, $\ell_i$ and $\ell_h$, leads to an overestimate of the extracted $I(t)$, as implied by Eq. (6). Second, as $\ell_c$ becomes smaller, the completion times are underestimated. Third, the sharp increase (decrease) in average eye (hole) sizes disappears, becoming nearly flat at a characteristic fraction $f^*$, and the kinetic curve $f(t)$ deviates significantly from its sigmoidal shape, becoming nearly linear. In fact, there is a close relationship between these last two effects. The sharp increase in average eye size results from the merger of smaller eyes, which dominates the late stage of replication kinetics. Since chopping DNA eliminates the large eyes, as shown in Fig. 3(e), it effectively increases the number of domains $n(t)$ per unit length in truncated segments and overestimates the replication rate. (The replication rate is $df/dt = 2vn$.)
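The truncation bias described above is easy to reproduce. The sketch below assumes a 0/1 site encoding (1 = replicated) and discards a final remainder shorter than $\ell_c$; both choices, and the function names, are ours for illustration. A single eye longer than $\ell_c$ is split into pieces no longer than $\ell_c$, which lowers the measured mean eye size.

```python
from itertools import groupby


def chop(track, lc):
    """Cut a molecule (sequence of 0/1 sites) into consecutive pieces of length lc.

    A final remainder shorter than lc is discarded, as if it were lost in combing.
    """
    return [track[k:k + lc] for k in range(0, len(track) - lc + 1, lc)]


def mean_domain_sizes(segments):
    """Mean eye (1) and hole (0) lengths over a list of segments."""
    eyes, holes = [], []
    for seg in segments:
        for replicated, run in groupby(seg):
            (eyes if replicated else holes).append(sum(1 for _ in run))

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(eyes), mean(holes)


track = [0] * 100 + [1] * 1000 + [0] * 100      # one long eye between two holes
print(mean_domain_sizes([track]))                # (1000.0, 100.0)
print(mean_domain_sizes(chop(track, 250)))       # (225.0, 100.0)
```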

We emphasize that the first two observations above imply that $\ell_c$ affects the basic time and length scales, $t^*$ and $\ell^*$, of the (chopped) systems introduced in the previous section. In Figs. 4(a)-(c), we re-plot $f(t)$, $I(t)$, and $\ell_i$ and $\ell_h$ using dimensionless axes. One can clearly see that the chopping process straightens the sigmoidal $f(t)$ and the average-domain-size curves. Nevertheless, the basic shape of $I(t)$ does not change; i.e., curves corresponding to different values of $\ell_c$ collapse onto one another, and the finite-size effect only makes the up-shooting tails steeper.

As a criterion for the significance of finite-size effects, we first define a new parameter $\beta = \ell_c/\ell^*$, namely, the maximum average number of domains per chopped molecule (around $f = 0.5$). A closer look at Figs. 4(a) and (c) then suggests that there might exist a critical value $\beta^*$ (or corresponding chopping size $\ell_c^*$) below which the finite-size effects severely affect the statistics. In other words, for $\beta > \beta^*$, one can ignore the finite-size effects by excluding the last few data points close to $f = 1$. (Recall that $\ell^*$ is the minimum average eye-to-eye spacing.) To see this clearly, in Fig. 5 we have plotted $t^*/t^*_\infty$ vs. $\beta$ for two different cases, $I(t) = 10^{-5}t$ and $I(t) = 0.001$, where $t^*_\infty$ has been calculated using the basic kinetic curve $f(t) = 1 - \exp[-2v \int_0^t g(t')\,dt']$ (i.e., for an infinitely large system) [...].

Indeed, $t^*$ changes very slowly above $\beta \approx 10$ but drops sharply below this value. Since $\beta$ is the average number of domains per molecule, we argue that the KJMA model can be applied to data directly when there are enough eyes in individual molecule fragments (roughly, at least 10). On the other hand, when $\beta < 10$, one would require more sophisticated theoretical methods to obtain correct statistics.

FIG. 5: The finite-size effects and changes in the basic time and length scales. Shown are two different initiation rates, $I(t) = 10^{-5}t$ and $I(t) = 0.001$. The vertical line is where the average number of domains per molecule is 10. The y-axis has been normalized relative to the initiation rate for an infinite system ($\beta \to \infty$).

One subtle point is that $t^*$, unlike $\ell^*$, is not very accessible experimentally and requires data processing for accurate extraction [e.g., Fig. 2(d) or Fig. 3(b)].

Finally, we note that the sudden up-shooting in the tails of the extracted $I(t)/2v$ vs. $2vt$ curves is yet another kind of finite-size effect, related to numerical differentiation [Eq. (6)]. These tails can simply be excluded from the analysis.



5. Finite-resolution effect 

Another generic problem is the finite resolution of measurements. In molecular-combing experiments, for example, epifluorescence microscopy is used to scan the fluorescent tracks of combed DNA on glass slides. The spatial resolution (~1 kb) means that smaller domains will not be detectable. Thus, two eyes separated by a hole of size < 1 kb will be falsely assumed to be one longer eye. We evaluate this effect by coarse-graining the statistics with experimental resolution $\Delta x^*$, while keeping $\Delta x = v\,dt$ in the simulation much finer. To coarse-grain by a factor $\delta = \Delta x^*/\Delta x$, we have used the raw, "unchopped" data set from the previous finite-size-effect section: after the simulation, we scanned the final lists of eyes and holes, $\{i\}$ and $\{h\}$, and removed any eye (hole) smaller than the resolution $\Delta x^*$, combining it with the two flanking holes (eyes) into a larger hole (eye) whose size equals that of all three domains.
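The merging rule just described can be sketched as a single left-to-right scan over the alternating list of domains. The data layout, the treatment of the two end domains (kept, since their true length is unknown), and the scan order are our assumptions; the text does not say how overlapping merges were resolved.

```python
def coarse_grain(domains, resolution):
    """Merge interior domains shorter than `resolution` into their neighbors.

    domains: list of (is_eye, length) pairs alternating along the molecule.
    A short interior domain is combined with its two flanking domains (which
    share the opposite kind) into one domain of the combined length.
    """
    out = []
    k = 0
    while k < len(domains):
        is_eye, length = domains[k]
        if out and k + 1 < len(domains) and length < resolution:
            prev_is_eye, prev_len = out.pop()
            out.append((prev_is_eye, prev_len + length + domains[k + 1][1]))
            k += 2                  # the right-hand neighbor has been absorbed
        else:
            out.append((is_eye, length))
            k += 1
    return out


track = [(True, 5), (False, 1), (True, 3), (False, 4), (True, 1), (False, 6)]
print(coarse_grain(track, resolution=2))   # [(True, 9), (False, 11)]
```

Total length is conserved while the number of domains drops, so mean domain sizes can only grow, which is the bias discussed next.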

In Figs. 6(a)-(c), we show how the statistics change under coarse-graining only (i.e., without chopping), where the coarse-graining factors $\delta$ are 20 and 100 (i.e., $\Delta x^* = 1$ and 5 for $\Delta x = v\,dt = 0.05$).

FIG. 6: (Color online). The effect of coarse-graining. (a) $f$ vs. $2vt$. From left to right, $\Delta x^* = 0$, 1, 5. (b) $I/2v$ vs. $2vt$. From top to bottom, the coarse-graining factor $\Delta x^* = 0$ (no coarse-graining), 1 (comparable to optical resolution), and 5. (c) Average domain sizes vs. $f$. The empty circles are for no coarse-graining, while the dotted and dashed lines are for $\Delta x^* = 1$ and 5, respectively. (d)-(f) Rescaled graphs.

The finite-resolution effect biases estimates in a way that is opposite to finite-size effects; i.e., converting eyes (holes) smaller than $\Delta x^*$ into holes (eyes) increases the average domain sizes. As a consequence, the extracted $I(t)$ is slightly underestimated. Nevertheless, the curves for each of $f(t)$, $I(t)$, and $\ell_i$ and $\ell_h$ collapse almost perfectly onto each other when the axes are rescaled using $t^*$ and $\ell^*$, confirming that, as with finite-size effects, the main consequence is a change in the basic time and length scales of the problem [Figs. 6(d)-(f)].

To find criteria for the significance of finite-resolution effects, we recall that coarse-graining falsely eliminates only eyes and holes smaller than the resolution $\Delta x^*$. For example, statistics for $f \approx 0$ (small eyes) or $f \approx 1$ (small holes) can be affected by coarse-graining. For these two cases, however, one can easily avoid a problem by excluding data for $f \approx 0$ and 1 from the analysis.

On the other hand, a more serious situation can arise when $\gamma = \ell^*/\Delta x^* < 1$, because a resolution comparable to the minimum eye-to-eye distance will seriously alter the mean domain sizes $\ell_i$ and $\ell_h$, and thus the extracted $I(t)$ as well. Indeed, for $\gamma \gg 1$, the $\rho(f, t_i)$'s remain essentially unchanged (i.e., the optimization result for $v$ remains the same) even at $\delta = 100$ (where $\gamma \approx 70$) (data not shown). We conclude that $\gamma = 1$ is the relevant criterion to test the significance of finite-resolution effects.



III. DISCUSSION AND CONCLUSION 

In the previous section, we tested various generic experimental limitations via Monte Carlo simulation. When the system is large ($10^7$ for $v = 0.5$ and $I(t) = 10^{-5}t$), we were able to extract all the input parameters accurately from a single realization of our simulation. As the experimental (simulation) conditions become less ideal, however, one requires more sophisticated tools.

In the presence of asynchrony, we have demonstrated that the input parameters can still be extracted to reasonable accuracy (roughly 10% for $\alpha \approx 5.4$) using an optimization method. In most DNA replication experiments, $\alpha > 1$. For example, in the Xenopus egg-extract experiments of Herrick et al. [13, ...], $\alpha \approx 2.5$ ($t^* \approx 15$ min and $\sigma_\tau \approx 6$ min). In this case, the method presented here can even be applied to data $\rho(f, t_i)$ for a single well-chosen timepoint $t_i$ to extract $v$. The accuracy increases as more data are collected at different timepoints.

The significance of finite-size effects can be estimated by the criterion $\beta = \ell_c/\ell^* \approx 10$. Fortunately, $\ell^*$ for Xenopus sperm chromatin is roughly 10 kb, while the typical size of combed molecules ranges between 100 and 500 kb, thus giving $10 < \beta < 50$. However, the origin spacing of many higher eukaryotes, including Xenopus after the mid-blastula transition, can be as large as 100 kb. In such cases, it is of critical importance to obtain long combed molecules (> 1 Mb).

Similarly, finite-resolution effects are insignificant when $\gamma = \ell^*/\Delta x^* > 1$. This condition is satisfied in almost all molecular-combing experiments of DNA replication, since $\Delta x^* \approx 1$ kb while $\ell^*$ typically ranges between 10 and 100 kb ($\gamma \approx 10$ to 100).
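The three design ratios used in this discussion can be bundled into a small helper. It is a sketch only: the threshold $\beta \ge 10$ encodes the rough criterion stated above (not a sharp cutoff), the function name and dictionary keys are ours, and the example numbers are the Xenopus values quoted in the text.

```python
def design_check(l_star_kb, combed_kb, resolution_kb=1.0, t_star=None, sigma_tau=None):
    """Rough design criteria from the text: beta = l_c/l* (about 10 or more),
    gamma = l*/dx* (> 1), and optionally alpha = t*/sigma_tau."""
    report = {
        "beta": combed_kb / l_star_kb,         # domains per combed molecule
        "gamma": l_star_kb / resolution_kb,    # eye-to-eye spacing in resolution units
    }
    report["finite_size_ok"] = report["beta"] >= 10
    report["resolution_ok"] = report["gamma"] > 1
    if t_star is not None and sigma_tau is not None:
        report["alpha"] = t_star / sigma_tau   # synchrony relative to replication time
    return report


# Xenopus sperm chromatin in egg extract (numbers quoted in the text).
print(design_check(l_star_kb=10, combed_kb=100, t_star=15, sigma_tau=6))
# Origin spacing ~100 kb (after the mid-blastula transition) with 500-kb molecules:
print(design_check(l_star_kb=100, combed_kb=500)["finite_size_ok"])   # False
```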

Among the various experimental limitations we have tested, finite-size effects seem to be potentially the most serious problem in molecular-combing experiments. Fortunately, we expect the finite-size effects in the experiments and analysis of Refs. [...] to be relatively insignificant because $\beta > 10$. On the other hand, we need more sophisticated theoretical tools to correct the finite-size effects for $\beta < 10$. We recall that the coarse-graining of molecules affects the tails in Fig. 6(b) in the opposite way to how the finite size of molecules affects them. We thus speculate that an intelligent way of annealing finite-sized molecules could reduce or correct the finite-size effects. We leave a detailed evaluation of this idea for future work.

In summary, we have discussed how to apply the KJMA model to data to extract kinetic parameters under various experimental limitations, such as asynchrony, finite-size, and finite-resolution effects. For the application to DNA-replication experiments, we have shown that finite-size effects can be ignored when the chopped molecules contain enough domains (i.e., $\beta > 10$). Even when the size of molecules is smaller than the critical value $\ell_c^*$, the shape of the nucleation rate $I(t)$ is not affected when plotted using rescaled parameters. On the other hand, finite-resolution effects are insignificant when $\gamma \gg 1$, which is the case for molecular-combing experiments of DNA replication.

The theoretical understanding of these limitations 
given here should provide guidelines for the design of 
future experiments. 